1 research outputs found

    A semantics aware approach to automated reverse engineering unknown protocols

    Full text link
    Abstract—Extracting the protocol message format specifica-tions of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we first discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers SMB protocol with 100 % precision and recall. For SMTP, ProDecoder achieves approximately 95 % precision and recall
    corecore